Add unsorted decompressed chunk path even if we have sorted ones #6879

Open · wants to merge 197 commits into base: main
Conversation

akuzm (Member) commented May 3, 2024:

The unsorted paths are better for hash aggregation, but currently, if we're doing aggregation and we can push down the sort, we only add sorted paths.

Fixes #6836
Fixes #7084
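
As a rough illustration (the hypertable and column names here are hypothetical, not taken from this PR's tests), this is the kind of query that benefits: with an unsorted decompressed path available, the planner can pick a HashAggregate instead of sorting the decompressed rows for a GroupAggregate.

    -- Hypothetical compressed hypertable: metrics(time, device_id, value).
    -- With an unsorted DecompressChunk path, the planner may choose
    -- HashAggregate here instead of Sort + GroupAggregate.
    EXPLAIN (COSTS OFF)
    SELECT device_id, count(*)
    FROM metrics
    GROUP BY device_id;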

akuzm added 30 commits May 3, 2024 17:03
The unsorted paths are better for hash aggregation, but currently in
this case we are only going to add sorted paths.
Add ANALYZE. To keep the desired MergeAppend plans, we also have to add a LIMIT everywhere, so that MergeAppend is chosen based on its lower startup cost (see the sketch below the commit messages). Otherwise a plain Sort over Append is chosen, because for small tables its total cost is lower.
Add ANALYZE after compression. The plan changes are expected: SeqScans are preferred over IndexScans, and Sort over MergeAppend, for small tables.
We would add extra Sort nodes when adjusting the children of a space-partitioning MergeAppend under ChunkAppend. This is not needed, because MergeAppend plans add the required Sort themselves, and in general no adjustment seems to be required specifically for the MergeAppend children there.
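
Regarding the ANALYZE and LIMIT change above, a minimal sketch of the test pattern, assuming a hypothetical table name:

    -- ANALYZE gives the planner real row counts for the chunks.
    ANALYZE metrics;

    -- With a LIMIT, paths compete on startup cost, so MergeAppend
    -- (which can stream the first rows without sorting everything)
    -- wins over a plain Sort over Append even on small tables.
    EXPLAIN (COSTS OFF)
    SELECT * FROM metrics ORDER BY time LIMIT 10;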
src/planner/planner.h (review thread, outdated and resolved)
Comment on lines +1619 to +1624
Group Key: _hyper_31_114_chunk.device_id
-> Sort
Sort Key: _hyper_31_114_chunk.device_id
-> Gather
Workers Planned: 2
-> Parallel Append
akuzm (Member, Author) commented:

This should be GatherMerge above the Sort; it will be addressed in #7547.

Comment on lines -933 to -942
/*
* Check if this path is parameterized on a compressed
* column. Ideally those paths wouldn't be generated
* in the first place but since we create compressed
* EquivalenceMembers for all EquivalenceClasses these
* Paths can happen and will fail at execution since
* the left and right side of the expression are not
* compatible. Therefore we skip any Path that is
* parameterized on a compressed column here.
*/
akuzm (Member, Author) commented Dec 19, 2024:

I think I fixed this some time ago: we shouldn't be creating EquivalenceMembers on compressed columns of the compressed chunk table anymore, because they don't make sense anyway. I removed this check, and the tests for the older issues still pass.

svenklemm (Member) commented Dec 21, 2024:

Did this have any effect on planning time with many compressed chunks?

@@ -200,8 +200,8 @@ generate_series(1,3) device;
         Sort Method: top-N heapsort
         ->  Custom Scan (DecompressChunk) on _hyper_1_3_chunk (actual rows=30 loops=1)
               Filter: (device = ANY ('{1,2,3}'::integer[]))
-              ->  Index Scan using compress_hyper_2_6_chunk_device__ts_meta_min_1__ts_meta_max_idx on compress_hyper_2_6_chunk (actual rows=3 loops=1)
-                    Index Cond: (device = ANY ('{1,2,3}'::integer[]))
+              ->  Seq Scan on compress_hyper_2_6_chunk (actual rows=3 loops=1)
svenklemm (Member) commented:

This seems like a regression? We have a constraint on the first index column, so the index should be beneficial.

akuzm (Member, Author) replied:

For small tables, it often happens that a Seq Scan is chosen instead of an Index Scan; e.g. we often see this change after adding ANALYZE.

akuzm (Member, Author) commented Dec 23, 2024:

Did this have any effect on planning time with many compressed chunks?

There's a 5% regression on a couple of queries in the planning suite; I'll see if I can optimize this somehow. There were also some changes in the "ordered_append_planning" suite, but those are actually an execution time change. I verified this manually and changed the queries to use EXPLAIN.

https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3997&var-run2=4018&var-threshold=0.02&var-use_historical_thresholds=true&var-threshold_expression=2%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now

The reason for the execution time change is that Gather Merge -> Sort -> Parallel Append -> DecompressChunk -> Parallel Seq Scan is chosen over Merge Append -> Sort -> DecompressChunk -> Seq Scan. This is because Postgres doesn't support Gather Merge -> Merge Append, as I mentioned on Slack before. Probably something we can improve in the future.

This happens for queries like SELECT * FROM space_part ORDER BY time DESC, a LIMIT 1;
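
For reference, a rough sketch of the two plan shapes described above, reconstructed from the description (illustrative only, not actual EXPLAIN output):

    -- Chosen plan (parallel):
    Limit
      ->  Gather Merge
            ->  Sort
                  ->  Parallel Append
                        ->  Custom Scan (DecompressChunk)
                              ->  Parallel Seq Scan

    -- Previous plan (serial):
    Limit
      ->  Merge Append
            ->  Sort
                  ->  Custom Scan (DecompressChunk)
                        ->  Seq Scan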
